A Hybrid Approach for Extractive Document Summarization Using Machine Learning and Clustering Technique

Authors

  • M. S. Patil
  • M. S. Bewoor
  • S. H. Patil
Abstract

The presence of the same information in multiple documents is the main problem faced in effective information access. Rather than this redundant information, users want to retrieve information that addresses one or several distinct aspects of a topic. In such situations, text summarization proves very useful. It is an extremely active research topic not only in information retrieval but also in fields such as natural language processing and machine learning. Text summarization is the process of extracting content from a document and generating a summary of it, presenting the important content to the user in a relatively condensed form. In this paper, several extractive text summarization approaches are studied and an effective text summarization method is proposed. The method is based on a Support Vector Machine (SVM). The proposed system tries to improve the performance and quality of the summary generated by the clustering technique by cascading it with an SVM.

Keywords: clustering, document summarization, extractive text summarization, machine learning, SVM.
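The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the sentence features, or the SVM configuration. The sketch assumes a greedy cosine-similarity clustering step to group redundant sentences, followed by a linear decision function of the form a trained linear SVM would produce. The names `cluster_sentences`, `features`, `svm_score`, and `summarize`, the two features, and the values of `WEIGHTS` and `BIAS` are all hypothetical placeholders chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Greedy single-pass clustering (an assumed stand-in for the paper's
    clustering step): a sentence joins the first cluster whose first
    member is at least `threshold` similar, else it starts a new cluster."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def features(index, sentence, total):
    """Two hypothetical features: relative position and capped length."""
    position = 1.0 - index / total
    length = min(len(tokenize(sentence)) / 20.0, 1.0)
    return [position, length]


# Hypothetical values standing in for the weights and bias that training
# a linear SVM on labeled summary sentences would produce.
WEIGHTS = [1.0, 0.5]
BIAS = -0.5


def svm_score(x):
    """Linear SVM decision function: w . x + b."""
    return sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS


def summarize(sentences, max_sentences=2):
    """Keep the best-scoring sentence of each cluster, then return the
    top `max_sentences` of those in their original document order."""
    total = len(sentences)
    best = []
    for cluster in cluster_sentences(sentences):
        scored = [(svm_score(features(i, sentences[i], total)), i) for i in cluster]
        best.append(max(scored))
    chosen = sorted(best, reverse=True)[:max_sentences]
    return [sentences[i] for _, i in sorted(chosen, key=lambda pair: pair[1])]


docs = [
    "Text summarization condenses a document.",
    "Text summarization condenses a document quickly.",
    "Support vector machines classify sentences.",
]
print(summarize(docs))
```

In this toy input the first two sentences fall into one cluster, so only one of them can reach the summary; the clustering stage handles redundancy while the scoring stage decides which candidates are worth keeping. In practice the weights would come from training rather than being fixed by hand.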


Similar resources

Feature expansion for query-focused supervised sentence ranking

We present a supervised sentence ranking approach for use in extractive summarization. Using a general machine learning technique provides great flexibility for incorporating varied new features, which we demonstrate. The system proves quite effective at query-focused multi-document summarization, both for single summaries and for series of update summaries.

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted the attention of many researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we face documents with large data volumes, the extr...

Hybrid Approach for Single Text Document Summarization Using Statistical and Sentiment Features

Summarization is a way to represent the same information concisely while preserving its meaning. It can be categorized into two types: abstractive and extractive. Our work focuses on extractive summarization. A generic approach to extractive summarization is to treat each sentence as an entity and score it on some indicative features to ascertain the quality of the sentence for inclusion...

Survey on Extractive Text Summarization Approaches

Due to the increasing use of the internet and of online technologies and data, there is a vast increase in electronic documents. When data is retrieved from such a huge collection of electronic documents, hundreds or thousands of documents are returned. Hence, it is not possible for the user to read all the retrieved documents. Also, these documents contain redundant information. In such si...

A Hybrid Approach to Multi-document Summarization of Opinions in Reviews

We present a hybrid method to generate summaries of product and service reviews by combining natural language generation and salient sentence selection techniques. Our system, STARLET-H, receives as input textual reviews with associated rated topics, and produces as output a natural language document summarizing the opinions expressed in the reviews. STARLET-H operates as a hybrid abstractive/...



Publication date: 2014